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We present a temperature-independent Monte Carlo 
method for the determination of the density of states of lattice 
proteins that combines the fast ground-state search strategy 
of the nPERM chain growth and multicanonical reweighting 
for sampling the complete energy space. Since the density of 
states contains all energetic information of a statistical sys- 
tem, we can directly calculate the mean energy, specific heat, 
Gibbs free energy, and entropy for all temperatures. We ap- 
ply this method to HP lattice proteins and for the examples 
of sequences considered, we identify the transitions between 
native, globule, and random coil states. Since no special prop- 
erties of heteropolymers are involved in this algorithm, the 
method applies to polymer models as well. 



The simulation of protein folding is extremely challeng- 
ing, since the interactions between the constituents of the 
macromolecule and the influence of the environment re- 
quire sophisticated models. One of the most essential 
aspects in the description of the folding process is the 
formation of a compact core of hydrophobic amino acid 
residues (H) which is screened from water by hydrophilic 
or polar residues (P). This characteristic property of re- 
alistic proteins can be qualitatively studied with simple 
lattice models such as the HP model [1]. By taking into 
account the attractive interaction between hydrophobic 
monomers only, the energy of a lattice protein with cer- 
tain conformation and sequence is calculated as follows: 
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where < symbolises that the sum is taken only 

over nearest lattice neighbours being nonadjacent along 
the self-avoiding chain of monomers. If the ith monomer 
is hydrophobic, ct^ = 1, while for a polar monomer ct^ = 0. 

As it is one of the main goals in off-lattice simulations 
to find low-lying energy states within a rough free energy 
landscape, good lattice folders are expected to have low- 
degenerated ground states. Much work has been done on 
identifying designing sequences with such native states. 
Ground-state search strategies on three-dimensional lat- 
tices range, for example, from enumeration [2,3] over 
hydrophobic core construction [4,5] and contact interac- 
tion [6] to chain growth methods [7-10]. Low- lying en- 
ergy states for HP sequences with up to 136 monomers 
were identified with these methods. 



In contrast, there were only few attempts to study 
the thermodynamic properties of the HP model in three 
dimensions [11]. The main reason is that conventional 
Monte Carlo methods like Metropolis sampling, but also 
more sophisticated methods like simulated [12] and paral- 
lel [13] tempering as well as histogram reweighting Monte 
Carlo algorithms such as multicanonical sampling [14] 
or the Wang-Landau method [15] expose problems in 
tackling "hidden" conformational barriers in combination 
with chain update moves which become usually ineffi- 
cient at low temperatures, where many attempted moves 
are rejected due to the self- avoidance constraint. One 
possibility to update the conformation is to apply move 
sets. Widely used sets usually consist of operations that 
change a single bond (end flips), two bonds (corner flips), 
three (crankshaft) or even more bonds, and pivot rota- 
tions, where the ith monomer serves as pivot point and 
one of the two partial chains connected with it is rotated 
about any axis through the pivot. Unfortunately, the 
more dense the conformation, the more inefficient this 
procedure becomes. 

Alternatively, it is possible to let the polymer grow, 
i.e., the nth monomer is placed at a randomly chosen 
next-neighbour site of the {n — l)th monomer (n < N 
with N being the total length of the polymer). If this 
new site is already occupied, the entire chain would have 
to be discarded to obtain correct statistics. This simple 
chain growth is not efficient, since the number of dis- 
carded chains grows exponentially with the chain length. 
Rosenbluth chain growth [16] avoids occupied neighbours 
at the expense of a bias, since the probability of such 
a chain is Pn ~ (nr=2'^') ^' where m; is the num- 
ber of free lattice sites to place the Ith monomer. This 
bias is balanced out by assigning each conformation a 
Rosenbluth weight ^ p'-^. Chain growth meth- 

ods with population control such as PERM (Pruned- 
Enriched Rosenbluth Method) [8,9] and its recent modi- 
fications nPERMjl [10] improve this procedure consider- 
ably by utilising the counterbalance between Rosenbluth 
weight and Boltzmann probability. The weight factor 
is therefore replaced by 
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where Ei is the energy of the partial chain X; — 
(xi, . . . ,Xi) created with Rosenbluth chain growth and 
T is the temperature. 

To explain the main ideas, we shall confine ourselves 
for the moment to the original PERM formulation [8], 
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where the sample of chains of length n is enriched by 
making identical copies once W^^^^ is bigger than a 
certain threshold value . In this case, the weight 
lyPERM jg divided among the clones. For W^^^^ being 
smaller than a lower bound W^, the chain is pruned 
with probability 1/2 and the weight of a surviving chain 
is doubled. The partition sum is proportional to the sum 
of weight factors (2) for the conformations X„^t of length 
n sampled at "time" t, 



(3) 



The PERM algorithms are very successful as ground- 
state searchers and the canonical distribution at a given 
temperature T is well reproduced over some orders of 
magnitude, but states that are highly suppressed at this 
temperature are not hit in a reasonable time. Standard 
reweighting techniques are applicable only in a small re- 
gion around T. Thus, recording temperature-dependent 
quantities such as the specific heat requires simulations 
at different temperatures. 

As the partition sum of a polymer or a het- 
eropolymer with fixed sequence can be expressed 
in terms of the density (or degeneracy) of states 
g{E), Z = E{x}e-^^(«) = Y.i9{Ei)e-^^^ {P ^ 
l/ksT), all energetic quantities such as the mean 
energy {E){T) = —{d/d(3)\nZ, the specific heat 
Cv{T) = {{E^) - {E}^) /ksT^, Gibbs free energy 
F{T) = -keTlnZ, and entropy S(T) = {{E) - F) /T 
can directly be calculated if the density of states is 
known. These quantities are of particular interest, since 
they are indicators of temperature-dependent conforma- 
tional transitions. 

Our method allows within one simulation the direct 
sampling of the density of states g{E) over the entire 
range of the energy space with probabilities ranging over 
many orders of magnitude, due to combining the advan- 
tages of energetic flat histogram reweighting at infinite 
temperature and chain growth. The flat distribution can 
then be reweighted to any desired temperature. Rare, 
i.e. low-lying energy states are also hit and therefore the 
low-temperature behaviour of the polymer can be repro- 
duced well, in particular the low-temperature transition 
between compact globules and ground states of lattice 
proteins with low ground-state degeneracy. Using the 
HP model, we applied the method to lattice proteins 
with more than 40 monomers and different ground-state 
degeneracies and found for examples with low ground- 
state degeneracy pronounced low-temperature peaks in 
the specific heat indicating ground-state - globule tran- 
sitions. Since our method is completely general, it is also 
applicable to other polymer models. 

In order to achieve a flat distribution of energetic 
states using chain growth, we introduce into the parti- 
tion sum (3) an additional weight W^^^{En{^n)) that 
depends on the energy En of a given conforination — 
(xi,...,x„): 
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Since the histograms at all intermediate stages of the 
chain growth process are required to be flat, the new 
reweighting factor is rewritten in product form and we 

have 
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with Wf^'' = 1. The PERM weight factors (2) lead to 

a canonical distribution P^™^^(i?„) which shall be de- 
formed to a constant distribution P^^^''^{En) over the 
entire energy space. This requires the weights W^*^* to 
be proportional to the inverse of the canonical distribu- 
tion, VK"*"- ^ 1 / P^'^'^''^ (En) , a condition that can obvi- 
ously only be satisfied iteratively [14]. As we are mainly 
interested in the density of states, which is proportional 
to the canonical probability distribution at = l/ksT = 
0, gn{En) ^ P^^^-°°{En), wc wiU effectively perform 
the simulation at infinite temperature. Consequently, 
W^„^"* ~ ^l9n{En) and Z„ ~ Et 5n(i;„(X„,t))W^n(X„,t). 
Here we have introduced the combined weight 
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which can also be written recursively, W„ = 

Wri-imn.9,7^(-^n)/.9n-i(-^"-i)- Thc Canonical mean 
value of any quantity 0„(X„) is then calculated as fol- 
lows: 
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where we have used mean values according to the flat dis- 
tribution (. . .)flat = Et • • ■ WniXn,t)/Zt' with the par- 
tition sum of the flat distribution Z^^^ = Et Wn{'Xn,t)- 
The most important technical part of the algorithm is 
the determination of the weig hts VK^^S since they are 
directly connected with the desired densities of states 
gn{E). As the weights are completely unknown in the 
beginning, wc evaluate them iteratively, starting from 

unity, Wf'"'^°^(£;) = 1 (2 < n < iV) for ah values of 
E. This means that the zeroth iteration is a pure chain 
growth run without reweighting. 

Each time, a chain of length n with energy E is created, 
the corresponding histogram value hn{E) is increased by 
the weight Wn of the chain. If a certain number of chains 
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with total length N was produced, the iteration is fin- 
ished and the new weie are determined 
by calculating Wn'"''^'\E) = Wn'"''^"~^\E)/hn{E), 2 < 
n < N. Before the next iteration starts, the histogram 
is reset, hn{E) — 0. The correct weights are found, 
when hn{E) is "flat", i.e., it has approximately the same 
value for all energies. In our actual implementation we 
employed a suitably adapted multicanonical variant of 
nPERMis {new PERM with importance sampling) [10]. 

Before we present results obtained with the new algo- 
rithm, we discuss general properties of heteropolymers 
exemplified for 14mers which can still be analysed by ex- 
act enumeration. Among all 2^^ sequences for 14mers 
there is only one designing sequence, i.e. a sequence to 
which a unique ground state belongs (up to a reflection 
symmetry) . 
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14.1 = HPHPH2PHPH2P2H (designing) ^S:^^ 

14.2 = H2P2HPHPH2PHPH - 

14.3 = H2PHPHP2HPHPH2 

14.4 = H2PHP2HPHPH2PH 
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FIG. 1. Specific heat for four 14mers with different se- 
quences. Only the 14mer with the native ground state shows 
a pronounced low-temperature peak indicating a transition 
between ground-states and globules. 

We compared thermodynamic properties of four 
14mers with different sequences but same hydrophobic- 
ity (nH — 8) and identical lowest energy (i?niin = —8). 
Figure 1 shows the speciflc heat for the different 14mers. 
A pronounced low-temperature peak that indicates the 
transition between ground states and compact globule 
states is only observed for the 14mer with the native 
ground state, 14.1. These results show qualitatively how 
the conformational transitions depend on the ground- 
state degeneracy go of the polymer. For the sequences 
14.2 and 14.3 it is twice that of the designing sequence, 
and 14.4 is even four times higher degenerate. 

The first example to which we applied our multicanon- 
ical chain growth algorithm is a 42mer with the se- 
quence PH2PHPH2PHPHP2H3PHPH2PHPH3P2HPHP- 
H2PHPH2P whose ground-state properties have similar- 
ities with the parallel /3-helix of pectate lyase C [17]. The 
lattice model with only four-fold ground-state degener- 
acy has a ground-state energy of i?min = —34. In order 
to investigate the low-temperature behaviour of this sys- 
tem it is necessary that the algorithm correctly samples 
the low-energy states and that it also hits the ground 
states. The measured density of states ranges over about 



25 orders of magnitude and covers the entire energy space 
[—34, 0], as shown in Fig. 2. 




FIG. 2. Density of states of the 42mer, normalised to unity, 
i.e. E,5(S0 = 1- 
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FIG. 3. (a) Specific heat and mean energy as functions 
of the temperature for the 42mer. The ground-state - 
globule and the globule - random coil transition occur at 
To « 0.27 and T\ « 0.53, respectively, (b) Reduced free 
energy F(T) = F(T) ^ TSo and entropy S{T) = S{T) - Sq. 

Figure 3(a) shows the results for the heat capacity 
and the mean energy. Writing out the raw energies 
and weights from the simulation, we analysed the data 
and calculated the statistical error by means of the jack- 
knife blocking method. Due to the low degeneracy of 
the ground states, the transition between native states 
and globule states is very pronounced and occurs at a 
temperature Tq w 0.27. The globule - random coil tran- 
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sition at Ti « 0.53, on the other hand, is rather weak. 
This confirms the results of Ref. [11]. From the density 
of states, we also obtain the Gibbs free energy F{T) and 
the entropy S{T). Since the ground-state degeneracy go 
is not accessible by stochastic search algorithms in gen- 
eral, we have plotted in Fig. 3(b) the reduced free energy 
F{T) = F{T) + TSo and the entropy S{T) = S{T) - So, 
up to the constant 5*0 = IngiQ, where go is the degeneracy 
of the ground state. 

Finally, similar to the consideration of the 14mers, we 
compare two 48mers with different ground-state prop- 
erties. The first one, which we denote by 48.1 has 
the sequence PHPH2PH6P2HPHP2HPH2PHPHP3HP2- 
H2P2H2P2HPHP2HP and its ground state with the en- 
ergy —34 is 5000-fold degenerate. The ground state of the 
other 48mcr (48.2) with the sequence HPH2P2H4PH3P2- 
H2P2HPH3PHPH2P2H2P3HP8H2 has the much higher 
degeneracy of 1.5 x 10^ and possesses the energy —32 [5]. 
As is demonstrated in Fig. 4, we also observe for these 
longer chains that the conformational transition between 
the lowest-energy states and the globules is stronger the 
lower the ground-state degeneracy is. 
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FIG. 4. Specific heat and mean energy for the two 48mers 
with different ground-state properties. The low-temperature 
conformational transition is more pronounced for the example 
48.1 with lower ground-state degeneracy. 

In conclusion, we have developed a multicanonical 

chain growth algorithm that allows the simulation of the 
thermodynamic properties of polymers and heteropoly- 
mers. It is based on energetic flat histogram sampling of 
the density of states in combination with PERM chain 
growth. We applied this algorithm to hctcropolymcrs 
with more than 40 monomers and obtained accurate den- 
sities of states over about 25 orders of magnitude that 
cover the entire energy range, thus yielding very good 
results for all derived energetic quantities such as mean 
energy, specific heat, free energy, and entropy. In partic- 
ular, this enabled us to determine the low-temperature 
behaviour of the systems with high precision and to ob- 
serve pronounced low-temperature peaks of the specific 
heat for lattice proteins with low ground-state degener- 
acy indicating the ground-state - globule transition. 
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